DISSERTATION Fault-Tolerant Distributed Algorithms for On-Chip Tick Generation: Concepts, Implementations and Evaluations
Abstract
This thesis develops a novel approach to the on-chip generation of a fault-tolerant clock. The work is motivated by the fact that shrinking feature sizes and the accompanying increase in transient failure rates make it increasingly desirable to build VLSI (Very Large Scale Integration) circuits that incorporate fault-tolerance mechanisms. In particular, the research concentrates on the most prominent single point of failure in modern chip design, namely the clock signal of synchronous circuits. After a survey of alternative design approaches and existing schemes for achieving fault tolerance, a novel fault-tolerant clocking scheme is introduced. The proposed clock generation method is based on the hardware implementation of a well-known distributed clock synchronization algorithm. Most notably, it provides scalable fault tolerance for up to f arbitrary (Byzantine) failures in a system of n ≥ 3f + 2 tick generation nodes. Additionally, the clocking scheme does not rely on synchronizing separate clock sources such as quartz oscillators; instead, the distributed clock signals are generated in a synchronized way. This unique property relieves the design of metastability issues at clock boundaries. Adapting the original software-based algorithm to the peculiarities of chip design proved to be an intricate task. Therefore, the major part of the work deals with the design and development of the algorithm’s hardware equivalent, which finally resulted in a fully operational VLSI chip design. To assess the properties of the novel fault-tolerant clocking approach and to show its feasibility, exhaustive evaluations have been performed. The presented assessments aim at a thorough characterization of (i) the developed chip design and (ii) the distributed clock generation scheme on which these chips are based.
Additionally, the conducted measurements made it possible to validate worst-case measures that were derived in advance from the formal analysis of the clocking approach. To attain a more comprehensive characterization of the design, the worst-case evaluations are complemented by measurements and simulations for typical operating scenarios. The work concludes with a short summary and a brief treatment of the most notable topics for ongoing and future research.
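As a rough illustration of the resilience bound n ≥ 3f + 2 stated in the abstract, the following Python sketch computes the largest number of Byzantine failures a system of n tick generation nodes can tolerate, and the smallest node count needed for a given f. The function names are hypothetical and are not taken from the thesis; only the inequality itself comes from the abstract.

```python
def max_tolerated_faults(n: int) -> int:
    """Largest f satisfying n >= 3f + 2 (the bound quoted in the abstract).

    Hypothetical helper for illustration only.
    """
    if n < 2:
        # Even f = 0 requires n >= 2 under this bound.
        raise ValueError("the bound n >= 3f + 2 needs at least 2 nodes")
    return (n - 2) // 3


def min_nodes(f: int) -> int:
    """Smallest n satisfying n >= 3f + 2 for a given fault count f."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 2
```

Under this bound, for example, tolerating a single Byzantine node requires at least five tick generation nodes, and adding one or two more nodes beyond that does not yet raise the tolerated fault count.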
Similar resources
Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)
Nowadays, faults and failures are increasing especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip due to the increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...
Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation
Today’s hardware technology presents a new challenge in designing robust systems. Deep submicron VLSI technology introduced transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and needs to be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with ...
VLSI Implementation of a Distributed Algorithm for Fault-Tolerant Clock Generation
We present a novel approach for the on-chip generation of a fault-tolerant clock. Our method is based on the hardware implementation of a tick synchronization algorithm from the distributed systems community. We discuss the selection of an appropriate algorithm, present the refinement steps necessary to facilitate its efficient mapping to hardware, and elaborate on the key challenges we had to ...
CAFT: Cost-aware and Fault-tolerant routing algorithm in 2D mesh Network-on-Chip
The increasing complexity of chips and the need to integrate more components into a chip have made network-on-chip an important infrastructure for on-chip communication and a good alternative to traditional bus-based approaches. As chip density increases, the possibility of failure in the on-chip network increases and providing correction and fault tol...
Fault-Tolerant Distributed Algorithms on VLSI Chips
The Dagstuhl seminar 08371 on Fault-Tolerant Distributed Algorithms on VLSI Chips was devoted to exploring whether the wealth of existing fault-tolerant distributed algorithms research can be utilized for meeting the challenges of future-generation VLSI chips. Participants from both the distributed fault-tolerant algorithms community, interested in this emerging application domain, and from the ...